Sentence Filtering for Information Extraction in Genomics, a Classification Problem

نویسندگان

  • Claire Nedellec
  • Mohamed Ould Abdel Vetah
  • Philippe Bessières
چکیده

In some domains, Information Extraction (IE) from texts requires syntactic and semantic parsing. This analysis is computationally expensive and IE is potentially noisy if it applies to the whole set of documents when the relevant information is sparse. A preprocessing phase that selects the fragments which are potentially relevant increases the efficiency of the IE process. This phase has to be fast and based on a shallow description of the texts. We applied various classification methods — IVI, a Naive Bayes learner and C4.5 — to this fragment filtering task in the domain of functional genomics. This paper describes the results of this study. We show that the IVI and Naive Bayes methods with feature selection gives the best results as compared with their results without feature selection and with C4.5 results.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

انجام یک مرحله پیش پردازش قبل از مرحله استخراج ویژگی در طبقه بندی داده های تصاویر ابر طیفی

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been successfully applied to multispectral data are not as effective for hyperspectral data as well. Various investigations indicate that the key problem that causes poor performance in the stochastic approaches t...

متن کامل

Sentence-based Text Summarization : Modelling and Evaluation

This paper addresses the problem of creating a summary by extracting a set of sentences that are likely to represent the content of a document. A small scale experiment is conducted leading to the compilation of an evaluation corpus for the Greek language. Three models of sentence extraction are then described, along the lines of shallow linguistic analysis, feature combination and machine lear...

متن کامل

Using Profile Matching and Text Categorization for Answer Extraction in TREC Genomics

TREC’06 genomics track was focusing on text mining and passage retrieval. WIM lab participated in this year’s TREC genomics track. Our system consists of five parts: preprocessing, sentence generation, document retrieval, answer extraction and answer fusion. And we developed two different method: a automated profile matchingbased method and a text categorizationbased method to do the text minin...

متن کامل

Circular Mean Filtering For Textures Noise Reduction

In this paper, a special preprocessing operations (filter) is proposed to decrease the effects of noise of textures. This filter using average of circular neighbor points (Cmean) to reduce noise effect. Comparing this filter with other average filters such as square mean filter and square median filter indicates that it provides more noise reduction and increases the classification accuracy...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2001